Enhancing energy minimization framework for scene text recognition with top-down cues

Authors

Anand Mishra

Karteek Alahari

C. V. Jawahar

Abstract

Recognizing scene text is a challenging problem, even more so than the recognition of scanned documents. This problem has gained significant attention from the computer vision community in recent years, and several methods based on energy minimization frameworks and deep learning approaches have been proposed. In this work, we focus on the energy minimization framework and propose a model that exploits both bottom-up and top-down cues for recognizing cropped words extracted from street images. The bottom-up cues are derived from individual character detections from an image. We build a conditional random field model on these detections to jointly model the strength of the detections and the interactions between them. These interactions are top-down cues obtained from a lexicon-based prior, i.e., language statistics. The optimal word represented by the text image is obtained by minimizing the energy function corresponding to the random field model. We evaluate our proposed algorithm extensively on a number of cropped scene text benchmark datasets, namely Street View Text, ICDAR 2003, 2011 and 2013 datasets, and IIIT 5K-word, and show better performance than comparable methods. We perform a rigorous analysis of all the steps in our approach and analyze the results. We also show that state-of-the-art convolutional neural network features can be integrated in our framework to further improve the recognition performance.
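The energy formulation described in the abstract can be illustrated with a minimal sketch. It assumes a simple chain of character nodes, unary costs taken as the negative log of hypothetical detection scores, and pairwise costs taken as the negative log of smoothed bigram frequencies from a toy lexicon. The function names, the four-letter alphabet, the scores, and the use of exact min-sum dynamic programming are all assumptions made for illustration; the abstract does not specify the graph structure or the inference method used in the paper.

```python
import math
from collections import Counter


def bigram_costs(lexicon, alphabet, smoothing=1.0):
    """Pairwise costs: negative log of smoothed bigram frequencies in the lexicon."""
    counts = Counter()
    for word in lexicon:
        for a, b in zip(word, word[1:]):
            counts[(a, b)] += 1
    total = sum(counts.values()) + smoothing * len(alphabet) ** 2
    return {
        (a, b): -math.log((counts[(a, b)] + smoothing) / total)
        for a in alphabet
        for b in alphabet
    }


def minimize_energy(unary, pairwise, weight=1.0):
    """Exact min-sum over a chain of nodes; returns (best_word, energy).

    unary:    list of dicts, one per character position, label -> cost
    pairwise: dict mapping (previous_label, label) -> cost
    weight:   relative strength of the lexicon prior (hypothetical parameter)
    """
    labels = list(unary[0])
    # best[c] = lowest energy of any labeling of the nodes seen so far ending in c
    best = {c: unary[0][c] for c in labels}
    back = []
    for node in unary[1:]:
        new_best, pointers = {}, {}
        for c in labels:
            prev = min(labels, key=lambda p: best[p] + weight * pairwise[(p, c)])
            new_best[c] = best[prev] + weight * pairwise[(prev, c)] + node[c]
            pointers[c] = prev
        best = new_best
        back.append(pointers)
    last = min(labels, key=lambda c: best[c])
    energy = best[last]
    # Follow the back-pointers from the last node to recover the full word.
    word = [last]
    for pointers in reversed(back):
        word.append(pointers[word[-1]])
    return "".join(reversed(word)), energy


# Toy data (all values are made up for illustration).
alphabet = "acot"
lexicon = ["cat", "cot", "coat", "taco"]
detections = [  # per-position character scores from a hypothetical detector
    {"c": 0.70, "o": 0.10, "a": 0.10, "t": 0.10},
    {"c": 0.50, "o": 0.40, "a": 0.05, "t": 0.05},  # "o" misread as "c"
    {"c": 0.10, "o": 0.05, "a": 0.05, "t": 0.80},
]
unary = [{ch: -math.log(p) for ch, p in node.items()} for node in detections]
pairwise = bigram_costs(lexicon, alphabet)

print(minimize_energy(unary, pairwise, weight=0.0)[0])  # bottom-up cues only
print(minimize_energy(unary, pairwise, weight=1.0)[0])  # with the lexicon prior
```

With the prior switched off, the sketch returns the per-position best guesses ("cct"). With the prior enabled, the frequent bigram "co" outweighs the slightly stronger detection of "c" in the second position, and the minimum-energy word becomes "cot".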

Related resources

Understanding Text in Scene Images

With the rapid growth of camera-based mobile devices, applications that answer questions such as, "What does this sign say?" are becoming increasingly popular. This is related to the problem of optical character recognition (OCR) where the task is to recognize text occurring in images. The OCR problem has a long history in the computer vision community. However, the success of OCR systems is la...

Recognizing Text-Based Traffic Guide Panels with Cascaded Localization Network

In this paper, we introduce a new top-down framework for automatic localization and recognition of text-based traffic guide panels captured by car-mounted cameras from natural scene images. The proposed framework involves two contributions. First, a novel Cascaded Localization Network (CLN) joining two customized convolutional nets is proposed to detect the guide panels and the scene text on th...

Stereo reconstruction using top-down cues

We present a framework which allows standard stereo reconstruction to be unified with a wide range of classic top-down cues from urban scene understanding. The resulting algorithm is analogous to the human visual system where conflicting interpretations of the scene due to ambiguous data can be resolved based on a higher level understanding of urban environments. The cues which are reformulated...

Attention and the Minimal Subscene

We describe a computational framework that explores the interaction between focal visual attention, the recognition of objects and actions, and the related use of language. We introduce the notions of "minimal subscene" and "anchored subscene" to provide a middle ground representation, in which an agent is linked to objects or other agents via some action. We offer a preliminary model of visual...

Informing multisource decoding in robust automatic speech recognition

Listeners are remarkably adept at recognising speech in natural multisource environments, while most Automatic Speech Recognition (ASR) technology fails in these conditions. It has been proposed that this human ability is governed by Auditory Scene Analysis (ASA) processes, in which a sound mixture is segregated into perceptual packages, called ‘streams’, by a combination of bottom-up and top-d...

Journal:

Computer Vision and Image Understanding

Volume 145

Publication date: 2016
